Vol. 00 no. 00 2009 
Pages 1-7 



Target prediction and a statistical sampling algorithm for 
RNA-RNA interaction 

Fenix W.D. Huang^1, Jing Qin^1, Christian M. Reidys^{1,2,*} and
Peter F. Stadler^{3-7}

^1 Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, P.R. China
^2 College of Life Science, Nankai University, Tianjin 300071, P.R. China
^3 Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for
Bioinformatics, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
^4 Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
^5 RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology, Perlickstraße 1, D-04103
Leipzig, Germany
^6 Inst. f. Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
^7 The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, New Mexico, USA
Received on revised on *****; accepted on ***** 



Associate Editor: 



ABSTRACT 

It has been shown that the accessibility of the target sites has
a critical influence on miRNA and siRNA function. In this paper, we
present a program, rip2.0, that computes not only the energetically most
favorable target site based on the hybrid probability, but also
statistically sampled structures that illustrate the statistical
characterization and representation of the Boltzmann ensemble of
RNA-RNA interaction structures. The outputs are retrieved by backtracing
an improved dynamic programming solution for the partition function,
based on the approach of Huang et al. (Bioinformatics). The $O(N^6)$ time
and $O(N^4)$ space algorithm is implemented in C (available from
http://www.combinatorics.cn/cbpc/rip2.html).



1 INTRODUCTION 

Noncoding RNAs have been found to have roles in a great variety of
processes, including transcriptional regulation, chromosome replication,
RNA processing and modification, messenger RNA stability and translation,
and even protein degradation and translocation. Direct base-pairing with
target RNA or DNA molecules is central to the function of some ncRNAs
(Storz, 2002). Examples include the regulation of translation in both
prokaryotes (Narberhaus and Vogel, 2007) and eukaryotes (McManus and
Sharp, 2002; Banerjee and Slack, 2002), the targeting of chemical
modifications (Bachellerie et al., 2002), insertion editing (Benne, 1992),
and transcriptional control (Kugel and Goodrich, 2007). The common theme
in many RNA classes, including miRNAs, siRNAs, snRNAs, gRNAs, and
snoRNAs, is the formation of RNA-RNA interaction structures that are more
complex than simple sense-antisense interactions.

The hybridization energy is a widely used criterion to predict
RNA-RNA interactions (Rehmsmeier et al., 2004; Tjaden et al., 2006;
Busch et al., 2008). It has been shown that the accessibility of the
target sites has a critical influence on miRNA and siRNA function
(Ameres and Schroeder, 2007; Kertesz et al., 2007; Kretschmer-Kazemi Far
and Sczakiel, 2003). Although many regulatory ncRNAs have already been
identified, the number of experimentally verified target sites is much
smaller, which creates a great demand to narrow down the list of putative
targets. In its most general form, the RNA-RNA interaction problem (RIP)
is NP-complete (Alkan et al., 2006; Mneimneh, 2007). The argument for
this statement is based on an extension of the work of Akutsu (2000) for
RNA folding with pseudoknots. Polynomial-time algorithms can be derived,
however, by restricting the space of allowed configurations in ways that
are similar to pseudoknot folding algorithms (Rivas and Eddy, 1999). The
second major problem concerns the energy parameters, since the standard
loop types (hairpins, internal and multiloops) are insufficient; for the
additional types, such as kissing hairpins, experimental data are
virtually absent. Tertiary interactions, furthermore, are likely to have
a significant impact.

Several circumscribed approaches to target prediction have been
considered in the literature. The simplest approach concatenates the two
interacting sequences and subsequently employs a slightly modified
standard secondary structure folding algorithm. For instance, the
algorithms RNAcofold (Hofacker et al., 1994; Bernhart et al., 2006),
pairfold (Andronescu et al., 2005), and NUPACK (Ren et al., 2005)
subscribe to this strategy. The main problem of this approach is that it
cannot predict important motifs such as kissing-hairpin loops. The
paradigm of concatenation has also been generalized to the pseudoknot
folding algorithm of Rivas and Eddy (1999). The resulting model, however,
still does not generate all relevant interaction structures (Chitsaz
et al., 2009; Qin and Reidys, 2008). An alternative line of thought is to
neglect all internal base-pairings in either strand and to compute the
minimum free energy (mfe) secondary structure for their hybridization
under this constraint. For instance, RNAduplex and RNAhybrid (Rehmsmeier
et al., 2004) follow this line of thought. RNAup (Mückstein et al., 2006,
2008) and intaRNA (Busch et al., 2008) restrict interactions to a single
interval that remains unpaired in the secondary structure for each
partner. Owing to the highly conserved interaction motif, snoRNA/target
complexes are, however, treated more efficiently using a specialized tool
(Tafer et al., 2009). Pervouchine (2004) and Alkan et al. (2006)
independently derived and implemented mfe folding algorithms for
predicting the joint secondary structure of two interacting RNA molecules
with polynomial time complexity. In their model, a "joint structure"
means that the intramolecular structures of each molecule are
pseudoknot-free, the intermolecular binding pairs are noncrossing and
there exist no so-called "zig-zags". The optimal "joint structure" can be
computed in $O(N^6)$ time and $O(N^4)$ space by means of dynamic
programming.

Recently, Chitsaz et al. (2009) and Huang et al. (2009) independently
presented piRNA and rip1.0, tools that use dynamic programming algorithms
to compute the partition function of "joint structures", both in $O(N^6)$
time. Albeit differing in design details, they are equivalent. In
addition, Huang et al. (2009) identified in rip1.0 a basic data structure
that forms the basis for computing additional important quantities such
as the base pairing probability matrix. However, since the probability of
a hybrid is not simply the sum of the probabilities of its exterior arcs,
which are not independent, rip1.0 cannot compute the probability of a
hybrid.


Fig. 1. The natural structure of ompA-MicA (Udekwu et al., 2005),
in which the target sites are colored in red and the regions colored in
green are those with the five highest region probabilities. The target
sites $R[i,j]$ of ompA (interacting with MicA) whose probabilities are
larger than $10^{-1}$ are shown in Table 1.

Fig. 2. The natural structure of sodB-RyhB (Geissmann and Touati,
2004), in which the target sites are colored in red and the regions
colored in green are those with the five highest region probabilities.
The target sites $R[i,j]$ of sodB (interacting with RyhB) whose
probabilities are larger than $10^{-1}$ are shown in Table 2.

Fig. 3. The natural structure of fhlA-OxyS (Chitsaz et al., 2009), in
which the target sites are colored in red and the regions colored in
green are those with the five highest region probabilities. The target
sites $R[i,j]$ of fhlA (interacting with OxyS) whose probabilities are
larger than $10^{-1}$ are shown in Table 3.

The calculation of equilibrium partition functions and base-pairing
probabilities is an important advance toward the characterization of the
Boltzmann ensemble of RNA-RNA interaction structures. This elegant
algorithm, however, does not generate any structures. As Ding and
Lawrence (2003) suggested with prototype algorithms, the generation of a
statistically representative sample of secondary structures may provide a
resolution to this dilemma.

In contrast to rip1.0, given two RNA sequences, the output of rip2.0
consists not only of the partition function and the base pairing
probability matrix, but also of the contact-region probability matrix,
which is based on the hybrid probabilities obtained by introducing a new
component "hybrid" in the decomposition process, and of a statistically
sampled RNA-RNA structure based on the probability matrices. At the same
time, we decrease the storage space, i.e. the number of 4D-matrices and
2D-matrices required.






2 PARTITION FUNCTION 

2.1 Background 

Let us first review some basic concepts introduced by Huang et al.
(2009); see the supplementary material (SM) for a full version.

Given two RNA sequences $R$ and $S$ (e.g. an antisense RNA and its target)
with $N$ and $M$ vertices, we index the vertices such that $R_1$ is the 5' end
of $R$ and $S_1$ denotes the 3' end of $S$. The edges of $R$ and $S$ represent the
intramolecular base pairs. A joint structure, $J(R,S,I)$, is a graph with the
following properties, see Fig. 4 (B):

1. $R$, $S$ are secondary structures (each nucleotide being paired with
at most one other nucleotide via hydrogen bonds, without internal
pseudoknots);

2. $I$ is a set of arcs of the form $R_iS_j$ without pseudoknots, i.e., if
$R_{i_1}S_{j_1}, R_{i_2}S_{j_2} \in I$ where $i_1 < i_2$, then $j_1 < j_2$ holds;

3. There are no zig-zags, see Fig. 4 (A).

Fig. 5. From left to right: tight structures of type $\circ$, $\triangledown$, $\square$ and $\triangle$.
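Conditions 1 and 2 above can be checked mechanically. The following Python sketch is purely illustrative and is not taken from rip2.0: the function names and the encoding of arcs as pairs of vertex indices are assumptions made for this example, and the zig-zag condition (3) is deliberately left unchecked.

```python
from itertools import combinations
from typing import List, Tuple

Arc = Tuple[int, int]  # hypothetical encoding: an arc is a pair of vertex indices


def is_secondary_structure(arcs: List[Arc]) -> bool:
    """Condition 1: each vertex has at most one partner and no two arcs cross."""
    arcs = [(min(a), max(a)) for a in arcs]
    endpoints = [v for arc in arcs for v in arc]
    if len(endpoints) != len(set(endpoints)):
        return False  # a nucleotide would be paired more than once
    for (i, j), (k, l) in combinations(sorted(arcs), 2):
        if i < k < j < l:  # crossing interior arcs form a pseudoknot
            return False
    return True


def exterior_arcs_noncrossing(exterior: List[Arc]) -> bool:
    """Condition 2: for arcs R_i1 S_j1 and R_i2 S_j2, i1 < i2 implies j1 < j2."""
    ordered = sorted(exterior)
    # checking neighbours suffices because "<" is transitive
    return all(i1 < i2 and j1 < j2
               for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]))


def satisfies_conditions_1_and_2(r_arcs: List[Arc], s_arcs: List[Arc],
                                 exterior: List[Arc]) -> bool:
    """Check conditions 1 and 2 only; zig-zags (condition 3) are not tested."""
    r_used = {v for arc in r_arcs for v in arc}
    s_used = {v for arc in s_arcs for v in arc}
    # a vertex carrying an exterior arc must not also carry an interior arc
    if any(i in r_used or j in s_used for i, j in exterior):
        return False
    return (is_secondary_structure(r_arcs)
            and is_secondary_structure(s_arcs)
            and exterior_arcs_noncrossing(exterior))
```

For example, `satisfies_conditions_1_and_2([(1, 4)], [(2, 5)], [(5, 1), (6, 6)])` accepts the input, whereas the exterior arcs `[(2, 3), (3, 1)]` are rejected because they cross.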

Joint structures are exactly the configurations that are considered in the
maximum matching approach of Pervouchine (2004), in the energy minimization
algorithm of Alkan et al. (2006), and in the partition function approach of
Chitsaz et al. (2009). The subgraph of a joint structure $J(R,S,I)$ induced
by a pair of subsequences $\{R_i, R_{i+1}, \dots, R_j\}$ and
$\{S_h, S_{h+1}, \dots, S_\ell\}$ is denoted by $J_{i,j;h,\ell}$. In particular,
$J(R,S,I) = J_{1,N;1,M}$. We say $R_aR_b$ ($S_aS_b$, $R_aS_b$) $\in J_{i,j;h,\ell}$
if and only if $R_aR_b$ ($S_aS_b$, $R_aS_b$) is an edge of the graph
$J_{i,j;h,\ell}$. Furthermore, $J_{i,j;h,\ell} \subset J_{a,b;c,d}$ if and only if
$J_{i,j;h,\ell}$ is a subgraph of $J_{a,b;c,d}$ induced by $\{R_i, \dots, R_j\}$ and
$\{S_h, \dots, S_\ell\}$. Given a joint structure, $J_{a,b;c,d}$, its tight
structure (ts) $J_{a',b';c',d'}$ is either a single exterior arc $R_{a'}S_{c'}$ (in
the case $a' = b'$ and $c' = d'$), or the minimal block centered around the
leftmost and rightmost exterior arcs $\alpha_l$, $\alpha_r$ (possibly being equal)
and an interior arc subsuming both, i.e., $J_{a',b';c',d'}$ is tight in
$J_{a,b;c,d}$ if it has either an arc $R_{a'}R_{b'}$ or $S_{c'}S_{d'}$ if
$a' \neq b'$ or $c' \neq d'$.

Fig. 4. (A): A zigzag, generated by $R_2S_1$, $R_3S_3$ and $R_5S_4$ (red). (B):
The joint structure $J_{1,24;1,23}$; the different segments and tight
structures into which $J_{1,24;1,23}$ decomposes are colored.

In the following, a ts is denoted by $J^{T}_{i,j;h,\ell}$. If $J_{a',b';c',d'}$ is
tight in $J_{a,b;c,d}$, then we call $J_{a,b;c,d}$ its envelope. By construction,
the notion of ts depends on its envelope. There are only four basic types of
ts, see Fig. 5:

$\circ$ : $\{R_iS_h\} = J^{\circ}_{i,j;h,\ell}$ and $i = j$, $h = \ell$;

$\triangledown$ : $R_iR_j \in J^{\triangledown}_{i,j;h,\ell}$ and $S_hS_\ell \notin J^{\triangledown}_{i,j;h,\ell}$;

$\square$ : $\{R_iR_j, S_hS_\ell\} \subset J^{\square}_{i,j;h,\ell}$;

$\triangle$ : $S_hS_\ell \in J^{\triangle}_{i,j;h,\ell}$ and $R_iR_j \notin J^{\triangle}_{i,j;h,\ell}$.

2.2 Refined decomposition grammar for target prediction

The unique ts decomposition would in principle already suffice to construct
a partition function algorithm. Indeed, each decomposition step corresponds
to a multiplicative recursion relation for the partition functions associated
with the joint structures. From a practical point of view, however, this would
result in an unwieldy, expensive implementation. The reason is the multiple
break points $a, b, c, d, \dots$, each of which corresponds to a nested for-loop.

We therefore need a refined decomposition that reduces the number of
break points. To this end we call a joint structure right-tight (rts),
$J^{RT}_{i,j;r,s}$, in $J_{i_1,j_1;r_1,s_1}$ if its rightmost block is a
$J_{i_1,j_1;r_1,s_1}$-ts, and double-tight (dts), $J^{DT}_{i,j;r,s}$, in
$J_{i_1,j_1;r_1,s_1}$ if both of its left- and rightmost blocks are
$J_{i_1,j_1;r_1,s_1}$-ts's. In particular, for the convenience of the
computation, we regard the single interaction arc as a special case of dts,
i.e. $J^{DT}_{i_1,i_1;r_1,r_1} = R_{i_1}S_{r_1}$. In order to obtain the
probability of a hybrid $J^{Hy}_{i,j;h,\ell}$ via the backtracing method used in
(Huang et al., 2009), we introduce the hybrid structure, $J^{Hy}_{i,j;h,\ell}$, as
a new block item used in the decomposition process. We adopt the point of
view of Algebraic Dynamic Programming (Giegerich and Meyer, 2002) and regard
each decomposition rule as a production in a suitable grammar. Fig. 6
summarizes the major steps in the decomposition: (I) "interior arc-removal"
to reduce ts, and (II) "block-decomposition" to split a joint structure into
two blocks. The scheme is complemented by the usual loop decomposition of
secondary structures.

Fig. 6. Illustration of Procedure (a), the reduction of arbitrary joint
structures and right-tight structures, and Procedure (b), the decomposition
of tight structures. The panel below indicates the 10 different types of
structural components: A, B: maximal secondary structure segments $R[i,j]$,
$S[r,s]$; C: arbitrary joint structure $J_{1,N;1,M}$; D: right-tight structure
$J^{RT}_{i,j;r,s}$; E: double-tight structure $J^{DT}_{i,j;r,s}$; F: tight
structure of type $\triangledown$, $\triangle$ or $\square$; G: type $\square$ tight
structure $J^{\square}_{i,j;h,\ell}$; H: type $\triangledown$ tight structure
$J^{\triangledown}_{i,j;h,\ell}$; J: type $\triangle$ tight structure
$J^{\triangle}_{i,j;h,\ell}$; K: hybrid structure $J^{Hy}_{i,j;h,\ell}$; L:
substructure of a hybrid, $R[i,j]$ or $S[h,\ell]$.






Fig. 7. The decomposition trees $T_{J_{1,15;1,8}}$ for the joint structure
$J_{1,15;1,8}$ according to the grammar in rip2.0 (A) and rip1.0 (B),
respectively.

According to the decomposition rule, a given joint structure is decomposed
into interior arcs and hybrids, see Fig. 7 (A). The details of the
decomposition procedures are collected in SM, Section 2, where we show that
for each joint structure $J_{1,N;1,M}$ we indeed obtain a unique decomposition
tree (parse tree), denoted by $T_{J_{1,N;1,M}}$. More precisely, $T_{J_{1,N;1,M}}$
has root $J_{1,N;1,M}$ and all other vertices correspond to a specific
substructure of $J_{1,N;1,M}$ obtained by the successive application of the
decomposition steps of Fig. 6 and the loop decomposition of the secondary
structures. The decomposition trees of a concrete example generated according
to rip2.0 and rip1.0 are shown in Fig. 7 (A) and (B), respectively.

Let us now have a closer look at the energy evaluation of $J_{i,j;h,\ell}$.
Each decomposition step in Fig. 6 results in substructures whose energies
we assume to contribute additively and generalized loops that need to be
evaluated directly. There are the following two scenarios:

I. Interior arc removal. The first type of decomposition focuses on
decomposing ts, similar to the approach of Huang et al. (2009). Most of the
decomposition operations in Procedure (b) displayed in Fig. 6 can be viewed
as the "removal" of an arc (corresponding to the closing pair of a loop in
secondary structure folding) followed by a decomposition. Both the loop type
and the subsequent possible decomposition steps depend on the newly exposed
structural elements. W.l.o.g., we may assume that we open an interior base
pair $R_iR_j$.

For instance, for a rts $J^{RT}_{p,q;r,s}$ (denoted by "D" in Fig. 6) we need to
determine the type of the exposed pairs of both $R[p,q]$ and $S[r,s]$.
Hence each such structure is indexed by two types drawn from
$\{E, M, K, F\}$. Analogously, there are in total four types of a hybrid
$J^{Hy}_{i,j;h,\ell}$, i.e.
$\{J^{Hy,EE}_{i,j;h,\ell}, J^{Hy,EK}_{i,j;h,\ell}, J^{Hy,KE}_{i,j;h,\ell}, J^{Hy,KK}_{i,j;h,\ell}\}$.






In order to ensure the maximality of a hybrid, the rts's
$J^{RT,KK}_{i,j;h,\ell}$, $J^{RT,KE}_{i,j;h,\ell}$, $J^{RT,EK}_{i,j;h,\ell}$ and
$J^{RT,EE}_{i,j;h,\ell}$ are further distinguished by whether or not there
exists an exterior arc $R_{i_1}S_{h_1}$ such that $R[i, i_1 - 1]$ and
$S[h, h_1 - 1]$ are isolated segments. If such an arc exists, we say the rts
is of type (B), and of type (A) otherwise. Similarly, a dts,
$J^{DT,KK}_{i,j;h,\ell}$, $J^{DT,KE}_{i,j;h,\ell}$, $J^{DT,EK}_{i,j;h,\ell}$ or
$J^{DT,EE}_{i,j;h,\ell}$, is of type (B) or (A) depending on whether $R_iS_h$ is
an exterior arc. For instance, Fig. 8 (I) displays the decomposition of
$J^{DT,KKB}_{i,j;h,\ell}$ into a hybrid and a rts of type (A), and Fig. 8 (II)
displays the decomposition of $J^{RT,KKA}_{i,j;h,\ell}$.

Suppose Jfj^^ e^^^ '^'^ contained in a kissing loop, that is we have either 

Then at least one of the two "blocks" contains the exterior arc belonging to 
E^j^ labeled by K and F, otherwise, see Fig. 8 (I). 

2.3 Examples for partition function recursions 

The computation of the partition function proceeds "from the inside to the
outside", see eq. (2.2). The recursions are initialized with the energies
of individual external base pairs and empty secondary structures on
subsequences of length up to four. In order to differentiate multi- and
kissing-loop contributions, we introduce the partition functions $Q^{m}_{i,j}$
and $Q^{k}_{i,j}$. Here, $Q^{m}_{i,j}$ denotes the partition function of secondary
structures on $R[i,j]$ or $S[i,j]$ having at least one arc contained in a
multi-loop. Similarly, $Q^{k}_{i,j}$ denotes the partition function of secondary
structures on $R[i,j]$ or $S[i,j]$ in which at least one arc is contained in a
kissing loop. Let $\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ be the set of
substructures $J_{i,j;h,\ell} \subset J_{1,N;1,M}$ such that $J_{i,j;h,\ell}$ appears
in $T_{J_{1,N;1,M}}$ as an interaction structure of type
$\xi \in \{DT, RT, \triangledown, \triangle, \square, \circ\}$ with loop-subtypes
$Y_1, Y_2 \in \{M, K, F\}$ on the sub-intervals $R[i,j]$ and $S[h,\ell]$, and
$Y_3 \in \{A, B\}$. Let $Q^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ denote the partition
function of $\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$.

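For instance, the recursion for $Q^{DT,KKB}_{i,j;h,\ell}$ displayed in Figure 8 (I)
is equivalent to:

$$
\begin{aligned}
Q^{DT,KKB}_{i,j;h,\ell} = \sum_{i_1,h_1} \Big(
  & Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,KKA}_{i_1+1,j;h_1+1,\ell}
  + Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,KF}_{i_1+1,j;h_1+1,\ell} \\
  & + Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,FF}_{i_1+1,j;h_1+1,\ell}
  + Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,FK}_{i_1+1,j;h_1+1,\ell} \Big)
  + Q^{Hy,KK}_{i,j;h,\ell},
\end{aligned}
\tag{2.1}
$$

in which the recursions for $Q^{Hy,EE}_{i,j;h,\ell}$, $Q^{Hy,EK}_{i,j;h,\ell}$,
$Q^{Hy,KE}_{i,j;h,\ell}$ and $Q^{Hy,KK}_{i,j;h,\ell}$ read:

$$
\begin{aligned}
Q^{Hy,EE}_{i,j;h,\ell} &= \sum_{i_1,h_1} Q^{Hy,EE}_{i,i_1;h,h_1}\,
  e^{-(\sigma_0 + \sigma G^{Int}_{i_1,h_1,j,\ell})}, \\
Q^{Hy,EK}_{i,j;h,\ell} &= \sum_{i_1,h_1} Q^{Hy,EK}_{i,i_1;h,h_1}\,
  e^{-(\sigma_0 + \sigma G^{Int}_{i_1,h_1,j,\ell} + (\ell - h_1 - 1)\beta_3)}, \\
Q^{Hy,KE}_{i,j;h,\ell} &= \sum_{i_1,h_1} Q^{Hy,KE}_{i,i_1;h,h_1}\,
  e^{-(\sigma_0 + \sigma G^{Int}_{i_1,h_1,j,\ell} + (j - i_1 - 1)\beta_3)}, \\
Q^{Hy,KK}_{i,j;h,\ell} &= \sum_{i_1,h_1} Q^{Hy,KK}_{i,i_1;h,h_1}\,
  e^{-(\sigma_0 + \sigma G^{Int}_{i_1,h_1,j,\ell} + (j + \ell - i_1 - h_1 - 2)\beta_3)}.
\end{aligned}
\tag{2.2}
$$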
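The "inside to outside" order of evaluation can be illustrated on a much smaller problem. The sketch below is not the rip2.0 recursion: it computes the partition function of a single sequence under an assumed toy energy model (a constant energy per base pair, a minimum hairpin loop of three unpaired bases), and all names are chosen for this example only. What it shares with the recursions above is that every entry is assembled from entries on strictly shorter intervals, which have already been filled in.

```python
import math

# Assumed toy model, for illustration only (not the rip2.0 energy model)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a base pair


def partition_function(seq, pair_energy=-1.0, kT=1.0):
    """Return the table Q, where Q[i][j] is the partition function of seq[i:j]."""
    n = len(seq)
    weight = math.exp(-pair_energy / kT)  # Boltzmann factor of one base pair
    # Initialization: empty intervals carry weight 1 (the empty structure)
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]
    # Inside to outside: short intervals first, then longer ones
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Q[i][j - 1]  # case 1: the last base j-1 is unpaired
            for k in range(i, j - 1 - MIN_LOOP):
                if (seq[k], seq[j - 1]) in PAIRS:
                    # case 2: k pairs with j-1; left part and interior are independent
                    total += Q[i][k] * weight * Q[k + 1][j - 1]
            Q[i][j] = total
    return Q
```

With `pair_energy=0.0` every structure has weight 1, so `partition_function("GGGAAACCC", pair_energy=0.0)[0][9]` simply counts the admissible structures of the toy model.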



Fig. 8. (I) Decomposition of $J^{DT,KKB}_{i,j;h,\ell}$ and (II) decomposition of
$J^{RT,KKA}_{i,j;h,\ell}$.



II. Block decomposition. The second type of decomposition is the splitting
of joint structures into "blocks". There are two major differences compared
with the method used in Huang et al. (2009). First, we introduce the hybrid
itself as a new block item in the grammar and furthermore decompose a
hybrid by removing a single exterior arc at each step. Second, we split
the whole interaction structure into blocks via the alternating
decomposition of a rts and a dts, as shown in Procedure (a) of Fig. 6.



3 BACKTRACING 

3.1 Target prediction

Given two RNA sequences, our sample space is the ensemble of all possible
joint interaction structures. Let $Q^{I}$ denote the partition function which
sums over all possible joint structures. The probability measure of a given
joint structure $J_{1,N;1,M}$ is given by

$$
\mathbb{P}_{J_{1,N;1,M}} = \frac{Q_{J_{1,N;1,M}}}{Q^{I}},
\tag{3.1}
$$

where $Q_{J_{1,N;1,M}}$ denotes the Boltzmann weight of $J_{1,N;1,M}$.
In contrast to the computation of the partition function "from the inside to
the outside", the substructure probabilities are obtained "from the outside
to the inside" via the total probability formula (TPF). That is, the
longest-range substructures are computed first. This is analogous to
McCaskill's algorithm for secondary structures (McCaskill, 1990).

Set $J = J_{1,N;1,M}$, $T = T_{J_{1,N;1,M}}$, and let
$\Lambda_{J_{i,j;h,\ell}} = \{J \mid J_{i,j;h,\ell} \in T\}$ denote the set of all
joint structures $J$ such that $J_{i,j;h,\ell}$ is a vertex in the decomposition
tree $T$. Then

$$
\mathbb{P}_{J_{i,j;h,\ell}} = \sum_{J \in \Lambda_{J_{i,j;h,\ell}}} \mathbb{P}_{J}.
\tag{3.2}
$$



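By virtue of TPF, let $\theta_s$ range over the possible parent structures of
$J_{i,j;h,\ell}$ and let $\mathbb{P}_{J_{i,j;h,\ell}|\theta_s}$ be the corresponding
conditional probability. Then we have
$\mathbb{P}_{J_{i,j;h,\ell}} = \sum_s \mathbb{P}_{J_{i,j;h,\ell}|\theta_s}\,\mathbb{P}_{\theta_s}$.

Let $\mathbb{P}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ be the probability of
$\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$. For instance,
$\mathbb{P}^{\triangle,MK}_{i,j;h,\ell}$ is the sum over all the probabilities of
substructures $J_{i,j;h,\ell} \in T_{J_{1,N;1,M}}$ such that
$J_{i,j;h,\ell} \in \mathbb{J}^{\triangle,MK}_{i,j;h,\ell}$, i.e. $J_{i,j;h,\ell}$ is a ts
of type $\triangle$ and $R[i,j]$, $S[h,\ell]$ are respectively enclosed by a
multi-loop and a kissing loop. Given a component
$\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ (shown in Figure 6), we say another
component $\mathbb{J}^{\xi',Y_1'Y_2'Y_3'}_{i',j';h',\ell'}$ is its parent-component
if and only if, as a substructure, $J^{\xi',Y_1'Y_2'Y_3'}_{i',j';h',\ell'}$ could be a
parent structure of $J^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ in the decomposition tree.
Accordingly, we say $\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ is the child-component
of $\mathbb{J}^{\xi',Y_1'Y_2'Y_3'}_{i',j';h',\ell'}$. For instance,
$J^{DT,KKB}_{i,j;h,\ell}$ is one of the parent-components of
$J^{Hy,KK}_{i,i_1;h,h_1}$, see Figure 8 (I).

Let $\Theta_s$ range over the possible parent-components of
$\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$. Accordingly, we have

$$
\mathbb{P}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}
  = \sum_s \mathbb{P}_{\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}|\Theta_s}\,\mathbb{P}_{\Theta_s},
\tag{3.3}
$$

where by definition

$$
\mathbb{P}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}
  = \sum_{J_{i,j;h,\ell} \in \mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}} \mathbb{P}_{J_{i,j;h,\ell}}.
\tag{3.4}
$$

Furthermore, in the program, we calculate $\mathbb{P}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$
for all $i, j, h, \ell$ during the decomposition of $\Theta_s$. That is, given
$\mathbb{P}_{\Theta_s}$, the contribution of $\Theta_s$ to the probability of its
child-component $e_i$ is $\mathbb{P}_{e_i|\Theta_s}\,\mathbb{P}_{\Theta_s}$, where
$\mathbb{P}_{e_i|\Theta_s}$ denotes the conditional probability of the event that,
given that $\Theta_s$ is the parent-component, $e_i$ is its child-component. In
particular, $\mathbb{J}^{\xi,Y_1Y_2Y_3}_{i,j;h,\ell}$ is the child-component $e_m$
of $\Theta_s$ for some $m$.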
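The total probability formula used in this outside-to-inside pass can be written down in a few lines. The sketch below is an illustration only: the function name and the tuple layout are hypothetical, and it assumes, as in McCaskill-type backtracing, that the conditional probability of a child given its parent equals the partition-function contribution of the decomposition case producing the child, divided by the partition function of the parent.

```python
from typing import Iterable, Tuple


def child_probability(parents: Iterable[Tuple[float, float, float]]) -> float:
    """Total probability formula over all parent components of one child.

    Each entry is (p_parent, case_weight, q_parent):
      p_parent    -- probability of the parent component, already known
                     because parents are processed before their children;
      case_weight -- contribution to the parent's partition function of the
                     decomposition case that produces the child;
      q_parent    -- partition function of the parent component.
    """
    return sum(p_parent * case_weight / q_parent
               for p_parent, case_weight, q_parent in parents)
```

With the made-up numbers `[(1.0, 2.0, 8.0), (0.5, 3.0, 6.0)]`, each parent contributes 0.25, so the child receives probability 0.5.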

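Since the four subclasses of $J^{Hy}_{i,j;h,\ell}$, i.e. $J^{Hy,EE}_{i,j;h,\ell}$,
$J^{Hy,EK}_{i,j;h,\ell}$, $J^{Hy,KE}_{i,j;h,\ell}$ and $J^{Hy,KK}_{i,j;h,\ell}$, are
mutually exclusive, we obtain

$$
\mathbb{P}^{Hy}_{i,j;h,\ell} = \mathbb{P}^{Hy,EE}_{i,j;h,\ell} + \mathbb{P}^{Hy,EK}_{i,j;h,\ell}
  + \mathbb{P}^{Hy,KE}_{i,j;h,\ell} + \mathbb{P}^{Hy,KK}_{i,j;h,\ell}.
\tag{3.5}
$$

Given a hybrid $J^{Hy}_{i,j;h,\ell}$, recall that its target sites are $R[i,j]$ and
$S[h,\ell]$. The probability of a target site $R[i,j]$ is defined by

$$
\mathbb{P}^{tar}_{R[i,j]} = \sum_{h,\ell} \mathbb{P}^{Hy}_{i,j;h,\ell}.
\tag{3.6}
$$

Analogously, we define $\mathbb{P}^{tar}_{S[h,\ell]}$. We predict the optimal
interaction region as the one with maximal probability, i.e.

$$
\mathbb{P}^{opt} = \max_{i,j} \mathbb{P}^{tar}_{R[i,j]}.
\tag{3.7}
$$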
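Equations (3.6) and (3.7) amount to a marginalization followed by an arg-max. A minimal Python sketch, assuming the hybrid probabilities are available as a dictionary keyed by `(i, j, h, l)` (this data layout and the function names are assumptions of the example, not the rip2.0 interface):

```python
from collections import defaultdict
from typing import Dict, Tuple


def target_site_probabilities(
        p_hy: Dict[Tuple[int, int, int, int], float]
) -> Dict[Tuple[int, int], float]:
    """Sum the hybrid probabilities over (h, l) for every site R[i, j], cf. (3.6)."""
    p_tar = defaultdict(float)
    for (i, j, h, l), p in p_hy.items():
        p_tar[(i, j)] += p
    return dict(p_tar)


def optimal_target_site(p_tar: Dict[Tuple[int, int], float]):
    """Return the site (i, j) of maximal probability and its value, cf. (3.7)."""
    return max(p_tar.items(), key=lambda item: item[1])
```

The same two functions, applied with the roles of `(i, j)` and `(h, l)` exchanged, would give the analogous quantities for $S[h,\ell]$.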



3.2 Statistically generating interaction structure 

In this section, we generalize the idea of Ding and Lawrence (2003) in order 
to draw a representative sample from the Boltzmann equilibrium distribu- 
tion of RNA interaction structures. The section is divided into two parts. At 
first we illustrate the correspondence between the decomposition grammar, 
i.e. the recursions for partition functions and the sampling probabilities for 
mutually exclusive cases and secondly, we describe the sampling algorithm. 

The calculation of the sampling probabilities is based on the recurrences
of the partition functions since for mutually exclusive and exhaustive cases,


Table 1. The target sites $R[i,j]$ of ompA (interacting with MicA) whose
probabilities are larger than $10^{-1}$.

  i,j        probability
  113,128    66.9%
  87,89      25.9%
  53,55      25.6%
  27,27      23.6%
  29,29      22.9%
  39,40      21.1%
  27,28      20.9%
  67,69      16.6%
  115,128    16.6%
  36,41      15.0%
  36,40      13.0%
  26,28      12.3%
  67,70      10.9%
  55,56      10.3%

Table 2. The target sites $R[i,j]$ of sodB (interacting with RyhB) whose
probabilities are larger than $10^{-1}$.

  i,j        probability
  52,60      83.0%
  15,17      54.6%
  38,47      24.7%
  15,16      19.0%
  72,75      17.4%
  77,78      16.8%
  45,47      14.2%
  71,74      13.7%
  73,75      12.3%
  77,81      11.1%
  14,17      11.0%

Table 3. The target sites $R[i,j]$ of fhlA (interacting with OxyS) whose
probabilities are larger than $10^{-1}$.

  i,j        probability
  87,93      63.8%
  39,48      50.7%
  62,64      43.6%
  70,72      39.6%
  30,30      28.4%
  70,73      27.0%
  39,45      17.0%
  87,92      13.5%
  40,45      11.9%
  63,64      11.4%



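the key observation is that the sampling probability for a case is equal to
the contribution of that case to the partition function divided by the
partition function. For instance, we again consider the decomposition of
$J^{DT,KKB}_{i,j;h,\ell}$. Let $\mathbb{P}^{0}_{i_1,h_1}$, $\mathbb{P}^{1}_{i_1,h_1}$,
$\mathbb{P}^{2}_{i_1,h_1}$, $\mathbb{P}^{3}_{i_1,h_1}$ and $\mathbb{P}^{4}$ be the
sampling probabilities for the five cases shown in Figure 8 (I), in
anticlockwise order. Then we have:

$$
\begin{aligned}
\mathbb{P}^{0}_{i_1,h_1} &= Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,KKA}_{i_1+1,j;h_1+1,\ell} \,/\, Q^{DT,KKB}_{i,j;h,\ell}, \\
\mathbb{P}^{1}_{i_1,h_1} &= Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,KF}_{i_1+1,j;h_1+1,\ell} \,/\, Q^{DT,KKB}_{i,j;h,\ell}, \\
\mathbb{P}^{2}_{i_1,h_1} &= Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,FF}_{i_1+1,j;h_1+1,\ell} \,/\, Q^{DT,KKB}_{i,j;h,\ell}, \\
\mathbb{P}^{3}_{i_1,h_1} &= Q^{Hy,KK}_{i,i_1;h,h_1}\, Q^{RT,FK}_{i_1+1,j;h_1+1,\ell} \,/\, Q^{DT,KKB}_{i,j;h,\ell}, \\
\mathbb{P}^{4} &= Q^{Hy,KK}_{i,j;h,\ell} \,/\, Q^{DT,KKB}_{i,j;h,\ell}.
\end{aligned}
$$

Since the probabilities of all mutually exclusive and exhaustive cases sum up
to 1, we have
$\sum_{i_1,h_1} \big( \mathbb{P}^{0}_{i_1,h_1} + \mathbb{P}^{1}_{i_1,h_1} + \mathbb{P}^{2}_{i_1,h_1} + \mathbb{P}^{3}_{i_1,h_1} \big) + \mathbb{P}^{4} = 1$,
which coincides with eqn. (2.1).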
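The rule "contribution divided by partition function" translates directly into code. In the sketch below the case labels and numerical contributions are made up for illustration; only the normalization and the weighted draw reflect the text.

```python
import random
from typing import Dict, Hashable


def sampling_probabilities(contributions: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Divide each case's contribution by their sum, i.e. by the partition function."""
    total = sum(contributions.values())
    return {case: value / total for case, value in contributions.items()}


def draw_case(contributions: Dict[Hashable, float], rng: random.Random) -> Hashable:
    """Draw one of the mutually exclusive and exhaustive cases at random."""
    cases = list(contributions)
    weights = [contributions[case] for case in cases]
    # random.choices normalizes the weights internally
    return rng.choices(cases, weights=weights, k=1)[0]


# Hypothetical contributions of five cases to one partition function
example = {"case0": 2.0, "case1": 1.0, "case2": 0.5, "case3": 0.25, "case4": 0.25}
probabilities = sampling_probabilities(example)
```

Here `probabilities` sums to 1, mirroring the identity stated above, and `draw_case(example, random.Random(0))` returns one of the five labels with the corresponding probability.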

Next we give a description of the sampling algorithm. As a generalization
of Ding and Lawrence (2003), we still use two stacks A and B. Stack A
stores sub-joint structures and their types $\xi$ in the form
$(i, j; h, \ell; \xi)$; for example, $(i, j; h, \ell; RTMK)$ represents a sub-joint
structure $J^{RT,MK}_{i,j;h,\ell}$. Stack B collects interior/exterior arcs and
unpaired bases that will define a sampled interaction structure once the
sampling process finishes. At the beginning, $(1, N; 1, M; \text{arbitrary})$ is
the only element in stack A. A sampled interaction structure is drawn
recursively as follows. First, starting with $(1, N; 1, M; \text{arbitrary})$, we
sample either a pair of separated secondary structures or a rts
$(i, N; j, M; RTEE)$ according to their sampling probabilities. In the former
case, $(1, N; \text{sec})$ and $(1, M; \text{sec})$ are stored in stack A. Otherwise,
$(1, i-1; \text{sec})$, $(1, j-1; \text{sec})$ and $(i, N; j, M; RTEE)$ are stored in
stack A. Second, given a new element in stack A, denoted by
$(i, j; h, \ell; \xi)$, we draw a particular case from all the mutually exclusive
and exhaustive cases according to the sampling probabilities and store the
corresponding sub-joint structures in stack A; all interior arcs, exterior
arcs and unpaired bases sampled in the process are stored in stack B. That
is, once the sampling for a "bigger" joint structure from stack A is
complete, the "smaller" sub-joint structures derived in the process have been
stored in stack A, and the sampled arcs and unpaired bases have been stored
in stack B, the element at the bottom of stack A is chosen for the
subsequent sampling. The whole process terminates when stack A is empty, at
which point a sampled interaction structure has been formed in stack B.
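The two-stack procedure can be sketched on a single sequence, in the spirit of Ding and Lawrence (2003). The code below is a toy, not the rip2.0 sampler: it assumes a constant Boltzmann factor per base pair and a minimum hairpin loop of three bases, it stores only arcs in stack B, and it pops the most recently pushed element of stack A, which does not affect the sampled distribution because the stored intervals are independent.

```python
import math
import random

# Assumed toy model, for illustration only (not the rip2.0 energy model)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a base pair


def partition_function(seq, weight):
    """Q[i][j] is the partition function of seq[i:j]; each pair has factor `weight`."""
    n = len(seq)
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Q[i][j - 1]  # last base unpaired
            for k in range(i, j - 1 - MIN_LOOP):
                if (seq[k], seq[j - 1]) in PAIRS:  # k pairs with j-1
                    total += Q[i][k] * weight * Q[k + 1][j - 1]
            Q[i][j] = total
    return Q


def sample_structure(seq, Q, weight, rng):
    """Draw one structure from the Boltzmann ensemble defined by Q."""
    stack_a = [(0, len(seq))]  # stack A: intervals that still have to be sampled
    stack_b = []               # stack B: arcs sampled so far
    while stack_a:
        i, j = stack_a.pop()
        if j - i < MIN_LOOP + 2:
            continue  # too short to hold a base pair, all bases stay unpaired
        # Mutually exclusive and exhaustive cases with their contributions to Q[i][j]
        cases = [(None, Q[i][j - 1])]
        for k in range(i, j - 1 - MIN_LOOP):
            if (seq[k], seq[j - 1]) in PAIRS:
                cases.append((k, Q[i][k] * weight * Q[k + 1][j - 1]))
        k = rng.choices([c[0] for c in cases], weights=[c[1] for c in cases], k=1)[0]
        if k is None:
            stack_a.append((i, j - 1))      # base j-1 unpaired
        else:
            stack_b.append((k, j - 1))      # record the sampled arc
            stack_a.append((i, k))          # part to the left of the arc
            stack_a.append((k + 1, j - 1))  # part enclosed by the arc
    return sorted(stack_b)
```

A typical call would be `Q = partition_function(seq, math.e)` followed by repeated calls to `sample_structure(seq, Q, math.e, random.Random(0))`; the frequencies of the sampled structures then approximate their Boltzmann probabilities in the toy model.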



4 RESULTS AND CONCLUSIONS 



The complete set of recursions comprises 15 4D-arrays for ts, 20
4D-arrays for right-tight structures $Q^{RT}_{i,j;r,s}$, and 20 4D-arrays
for dts $Q^{DT}_{i,j;r,s}$. In addition, we need the usual matrices for the
secondary structures $R$ and $S$, and the above mentioned matrices for
kissing loops. The full set of recursions is compiled in the SM,
Section 3.
Fig. 9. (A) The natural structure of sodB-RyhB (Geissmann and
Touati, 2004). (B) The hybrid-probability matrix generated via
rip2.0. This matrix represents all potential contact regions of the
sodB structure as squares, whose area is proportional to their
respective probability. (C) "Zoom" into the most likely interaction
region as predicted by rip2.0. All base pairs of the hybrid are labeled
by their probabilities.